TECHNICAL MEMORANDUM


A BROWSE FACILITY FOR EARTH SCIENCE REMOTE SENSING DATA-- 
CENTER DIRECTOR’S DISCRETIONARY FUND FINAL REPORT 

(PROJECT 91-09) 


INTRODUCTION 

The objective of the proposed research was to develop an image data browse facility for 
remotely sensed earth science data sets and to subsequently allow that and ancillary data to be 
extracted from an on-line data archive. This work was inspired by the video browse data facility 
of the Coastal Zone Color Scanner at the NASA GSFC (Feldman et al., 1989.) Large quantities of 
remotely sensed data that consist of observations from radar and satellite systems are already 
available to the scientist. By the end of the decade, the data rate is expected to increase to about
1000 times the current satellite image data rate. So that the scientist could more fully understand
which data were available for study, we developed an ability to visually browse the archived
image data and create movie loops of the data if so desired. Due to the wealth of information that
will be available from planned sensor systems, the scientist requires an ability to screen potential 
data sets which may be of scientific applicability for his research. The hope was that this screen- 
ing of data would allow the researcher to more rapidly assess whether a particular case study is 
worth pursuing. Once a decision was made to utilize the event in question, the scientist would be 
allowed to extract the data from the on-line archive. 

INITIAL APPROACH 

Initial thoughts for this research involved the use of personal computers and laser video 
disk hardware. Figure 1 depicts two possible solutions that were examined. The option in Figure
1a consists of a personal computer (PC), PC monitor, NTSC video monitor, video disk player/recorder,
and associated cabling. Figure 1b is a slightly modified version of Figure 1a in that video
output would coexist on the PC monitor. A long-range goal of the research was to have a server
device maintain the video browse data and distribute it to individual users via the local area
network. This approach would require the X-Windows 11 system from MIT and support for the TCP/IP
communications protocol.

Most remotely sensed data are stored in a digital format. However, they tend to be of an
unmanageable size and in a raw format. For example, polar orbiting satellite data may consist of
orbital swaths with a domain of 128 pixels wide by 3000 pixels along the path of motion. There 
are approximately fifteen such passes per day, covering different geographic regions. In order to 
produce a video browse product, the data needed to be remapped to a common projection and 
have all orbital passes for one day combined. In order to fit within the NTSC television format, 
the image sizes could not exceed 480 lines by 640 elements. This required the data to be
subsampled (reduced in spatial resolution) in order to fit within these boundaries. For the purposes of this
study, we used data from the WetNet project (Goodman et al., 1990.) These data were chosen due
to their availability and extensive use by NASA personnel. A sample image is given in Figure 2. 
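
The size constraint described above can be illustrated with a minimal sketch. It assumes a simple integer-stride subsampling and ignores the remapping to a common projection; the function names and the synthetic swath are illustrative assumptions, not the project's code.

```python
import math

NTSC_LINES = 480      # maximum image height stated in the text
NTSC_ELEMENTS = 640   # maximum image width stated in the text


def subsample_step(lines, elements):
    """Smallest integer stride that makes the image fit the NTSC frame."""
    return max(1,
               math.ceil(lines / NTSC_LINES),
               math.ceil(elements / NTSC_ELEMENTS))


def subsample(image, step):
    """Keep every `step`-th line and every `step`-th element of each line."""
    return [row[::step] for row in image[::step]]


# Synthetic swath (an assumption): 3000 lines along track, 128 pixels wide.
swath = [[(r + c) % 256 for c in range(128)] for r in range(3000)]
step = subsample_step(len(swath), len(swath[0]))
small = subsample(swath, step)
print(step, len(small), len(small[0]))
```

A single stride is applied to both axes here so that the aspect ratio is preserved; a production system would more likely subsample after the daily passes had been combined on the common map projection.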



Figure 1. Two potential hardware configurations depicting (a) separate control and video screens
and (b) control and video on the same monitor.

Figure 2. Polar orbiter image data from the WetNet project. Black regions denote inter-orbital
swaths and missing data.


Once the scientist has determined which image data are desired, he should have the option 
of extracting these data from the archival system. By linking the image browse facility with a 
Relational Data Base Management System (RDBMS), the researcher would be able to determine 
which ancillary data may be useful for the investigation. The Interactive Data Integration and 
Management System (IDIMS), developed for the Earth Science and Applications Division by the 
University of Alabama in Huntsville, is a menu-driven system which allows catalog browsing and 
retrieval of earth science data sets (Graves et al., 1990.) IDIMS manages and integrates data with 
an object-oriented approach that provides an interface between an RDBMS, the host computer 
operating system, storage management systems, and other software systems or algorithms. The 
RDBMS provides the capability to inter-relate data sets and algorithms from different scientific 
disciplines. The object-oriented user interface provides the flexibility and extensibility required 
for an evolving system to adapt to scientific needs. IDIMS was incapable of pictorially presenting 
data to the scientist. The researcher had to first extract data from the archive to determine whether 
a case study was worth pursuing. The integration of the image browse selection software with the 
IDIMS capability provides the scientist a better understanding of the data and allows him to 
extract all pertinent data from the data archive. 

The first phase of the project was to determine the feasibility of using laser video disk 
technology as the basis for the visual browse facility. It became apparent as we explored the laser
video disk technology that it was a medium which would greatly inhibit development of a visual
browse and data archive access facility. Several factors were part of that decision:

• Cost - writable laser video disk systems are very expensive ($30,000). 

• Software interfaces would be difficult to develop and would be device dependent. 

• Usage would be limited to one user at a time. 

• Users would have to physically relocate themselves to the system.

• Could not readily be accessed via computers on networks. 

• Data would be analog. 
• Several pieces of video hardware would be necessary to support the system ($15,000).

• Media and media format were in a state of flux. 

We then learned of new technology which was becoming available from Intel. A system, 
called DVI (Digital Video Interactive), appeared to be very promising. At the time, DVI consisted
of a set of computer boards which needed to be interfaced into a PC. The approach of DVI is to 
compress video data in real-time or through a production level (send your data to a lab) compres- 
sion technique. The data would subsequently be stored on the system's magnetic hard drive. Upon 
playback, these data could then be decompressed in real-time by the board set. It would be able to 
support still frame, as well as full-motion video. Compression ratios were indicated to be highest 
for the full-motion video compressed in the Intel labs. Compression of still frame data by the user 
was on the order of 5 to 1 with data loss. Other forms of lossless data compression perform as
well, if not better.

There were advantages and disadvantages to using this approach, and they are as follows: 
Advantages: 

• Runs on personal computers. 

• Multimedia is a potential boom industry. 

• Board sets will be incorporated into the motherboards, and by the year 2000, into the CPU itself.

• Less expensive than laser video disk. 

Disadvantages: 

• Does not run on UNIX platforms. 

• Incompatible with UNIX, OS/2, and Microsoft Windows. 

• Currently costs $1,000 to $2,000 per PC for hardware. 

• Compression technique results in user-uncontrollable data loss. 

• Full-motion video uses very low level disk access. 

• Does not allow for networked application. 

• Does not allow for ordinary disk access while playing back video. 

• Development software costs $5,000. 

• DVI uses only half the video scan lines. 

• Image sizes are restricted by the DVI system. 

The above disadvantages of using DVI greatly outweighed the advantages of this technol- 
ogy. These limitations may, however, be overcome in the next few years as the technology 
advances. Based on this, we opted not to utilize this technology for the visual browse facility. 

SOFTWARE APPROACH 

Based on the limitations of the previously described approaches, it was determined that a 
software solution to the problem was in order. This approach appeared favorable in that it would 
be an application which would be portable across a variety of platforms. Also, we would be able 
to utilize a file server device and computer networks. We decided to develop the graphical
environment based on industry standards. The two standards we decided to incorporate were the
X-Windows 11 system developed by MIT and the Motif Style Guide for Graphical User Interfaces
(GUI.) This system was chosen due to its wide use on a variety of vendor platforms, and because
it was a protocol definition rather than a vendor-specific implementation. We also decided to use
the UNIX operating system as the basis of the development effort. These decisions were made
based on industry acceptance of X11 and Motif, and the Government decision to purchase systems
which incorporate an open systems approach. The open systems approach is also attractive
in that the code should be more portable, when fully developed, and easier to maintain. We also 
decided that the facility should be written using the C programming language, since the language 
has proven to allow for highly portable code. 

Using the X Window System version 11 as the basis for our visual display system, we
were able to display images which are larger than NTSC television resolution. This was supported 
through resizing and scrolling capabilities. The TCP/IP communications protocol was utilized to 
pass the visual browse information from the browse data server machine to the scientist’s work- 
station. This means that the process is broken into a client-server problem, with the server main- 
taining the data base of browse images, and the client performing the display and movie loop 
capabilities. The client process is also responsible for making queries to the RDBMS. 
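
The client-server split described above implies that each browse image must be delimited on the byte stream. The following is a minimal sketch of one common way to do this, a length-prefixed frame; the framing scheme and function names are assumptions for illustration and are not taken from the project. A connected local socket pair stands in for the TCP connection between server and workstation.

```python
import socket
import struct


def send_frame(sock, payload):
    """Send one browse image as a 4-byte big-endian length plus the bytes."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly `count` bytes, since recv may return fewer than asked."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed before the frame was complete")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Receive one length-prefixed browse image."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


# Stand-in for a server/client pair: two connected sockets in one process.
server_end, client_end = socket.socketpair()
send_frame(server_end, b"\x01\x02\x03" * 100)
frame = recv_frame(client_end)
server_end.close()
client_end.close()
print(len(frame))
```

The same two functions would work unchanged on sockets returned by a TCP connect and accept, which is where a real browse server and client would use them.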

Figure 3 shows a conceptual view of the modules involved in the visual browse software 
system. As can be seen, a graphical user interface is the overall control facility of the system, with 
three major components. These components are the relational data base management system, the 
geographic region selector (map viewer), and the visual browse viewer / selector. The map viewer 
allows one to outline a region of interest on either a map of North America or a map of the world. 
The corner points of this region are then used as the basis of an initial query for the RDBMS. The
scientist then chooses other characteristics of interest from the RDBMS, and images are then 
returned to the visual browse viewer. Working with this tool, the scientist is able to then select 
those images (data sets) of interest. When completed, he requests the RDBMS to extract the 
actual data sets from the archival system. 



Figure 3. Functional elements of the visual browse facility. 


The IDIMS RDBMS was originally developed by UAH on an MSFC institutional IBM
3090 mainframe computer, and continues to support the Earth Science and Applications Division
(ESAD) data inventory. The original IDIMS is a text-oriented menu system. Extensions were
desired to allow the study of data set selection methods, GUI techniques, and data inventory
organization approaches. As a result of this research, development began on a new IDIMS prototype
which runs on the UNIX operating system and uses an X Windows 11 user interface. Two major efforts
of the UNIX IDIMS system were data set selection techniques and visual cueing approaches.

The new UNIX IDIMS allows the user to search on any of the metadata parameters available
under the existing mainframe IDIMS. In addition, UNIX IDIMS allows for a metadata inventory
which contains additional information about data files, including geographic coordinates,
location, and parameter keywords. Search fields may be specified in any order. The search will not 
be restricted on any field that is left unspecified. After the user has specified search criteria, he 
may submit a query to the ESAD inventory. A window will pop up, displaying the files that 
matched his search criteria, along with information about the data stored in each file and its loca- 
tion. The system can also store previously defined search parameters, which can be reloaded 
whenever the scientist wants to execute a common query. 
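
The search behavior described above, in which unspecified fields place no restriction on the result, can be sketched as follows. This is an illustration only: the record fields, the sensor names, and the use of `None` for an unspecified field are assumptions, and the real system issued its queries to an RDBMS rather than filtering in memory.

```python
def matches(record, criteria):
    """True when the record satisfies every specified criterion.

    A criterion of None means the field was left unspecified and never
    restricts the search. A (low, high) tuple is an inclusive range;
    any other value must match exactly.
    """
    for field, wanted in criteria.items():
        if wanted is None:
            continue
        value = record.get(field)
        if isinstance(wanted, tuple):
            low, high = wanted
            if value is None or not (low <= value <= high):
                return False
        elif value != wanted:
            return False
    return True


def search(inventory, criteria):
    """Return the inventory records that match the criteria."""
    return [record for record in inventory if matches(record, criteria)]


# Hypothetical inventory entries; names and values are placeholders.
inventory = [
    {"file": "a.dat", "sensor": "sensor_a", "lat": 34.7, "lon": -86.6},
    {"file": "b.dat", "sensor": "sensor_a", "lat": -10.0, "lon": 120.0},
    {"file": "c.dat", "sensor": "sensor_b", "lat": 35.0, "lon": -90.0},
]
query = {"sensor": "sensor_a", "lat": (25.0, 50.0), "lon": None}
hits = search(inventory, query)
print([record["file"] for record in hits])
```

Because the criteria are a plain dictionary, a commonly used query can be saved and reloaded as data, which mirrors the stored search parameters mentioned above.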

Prior to development of actual GUI screens, sample layouts were produced on paper and 
presented to ESAD scientists for review. Using the resulting suggestions, initial GUI screens were 
developed on the UNIX workstation and once again presented to Division scientists. Through this 
iterative approach, a system of GUI windows was designed and prototyped. The result was a 
graphical tool which allowed scientists to enter many different search criteria at a single screen, 
without having to navigate through many menu levels (screens.) By working closely with the sci- 
entists, we were able to produce an interface which better matched their desires and would 
decrease their data set selection time. 

UNIX IDIMS also features, via the map viewer, rectilinear and polar map projections 
which allow specification of geographical areas of interest through simple mouse manipulations. 
This feature provides scientists with a quick natural technique (drawing a box) for defining an 
area of interest. UNIX IDIMS will automatically place the latitude and longitude ranges into the 
data base query. A zoom function will enlarge geographic areas so that a more precise selection 
can be made. Figure 4 depicts portions of the browse screen with the region selection map. The 
map viewer was implemented to handle rectilinear maps of the world and North America. Options 
for views of the north and south poles also were available. For the rectangular maps, a rectangular 
rubber-band region was provided to select the four corners of interest. In the polar case, one
selected latitudinal rings by clicking a mouse button at a chosen latitude, then dragging to a new
latitude. This rubber-band donut chose all longitudes for the latitudinal band selected. In order to
assist in the selection process, reference grid lines were included as dashed lines on the maps
(Fig. 4).
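
As a rough illustration of how a rubber-band selection becomes latitude and longitude ranges for the query, the sketch below assumes a rectilinear map in which longitude and latitude vary linearly with pixel position. The map extents, pixel dimensions, and function names are assumptions, not details of the UNIX IDIMS implementation.

```python
def pixel_to_lonlat(x, y, width, height,
                    west=-180.0, east=180.0, south=-90.0, north=90.0):
    """Map a pixel on a rectilinear map to (longitude, latitude).

    Pixel (0, 0) is assumed to be the upper-left (north-west) corner.
    """
    lon = west + (x / width) * (east - west)
    lat = north - (y / height) * (north - south)
    return lon, lat


def box_to_ranges(x0, y0, x1, y1, width, height):
    """Convert two dragged corners to (lat_min, lat_max, lon_min, lon_max)."""
    lon_a, lat_a = pixel_to_lonlat(x0, y0, width, height)
    lon_b, lat_b = pixel_to_lonlat(x1, y1, width, height)
    return (min(lat_a, lat_b), max(lat_a, lat_b),
            min(lon_a, lon_b), max(lon_a, lon_b))


def ring_to_ranges(lat_start, lat_end):
    """Polar case: a latitude band selects every longitude."""
    return (min(lat_start, lat_end), max(lat_start, lat_end), -180.0, 180.0)


# A box dragged on a hypothetical 720 by 360 pixel world map.
print(box_to_ranges(180, 90, 360, 180, 720, 360))
print(ring_to_ranges(75.0, 60.0))
```

Taking the minimum and maximum of the two corners means the result does not depend on the direction in which the user drags the mouse.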


After the query portion of the selection has been completed, the list of browse images is 
extracted from the browse data base server and sent to the browse image viewer. Figure 5 depicts 
the browse viewer window with a sample browse image. As can be seen, the browse images are 
displayed in a scrollable window. These browse images allow one to further refine the selection 
process by allowing one to examine the data to determine if it is of interest for further study. For 
example, as can be seen in Figure 5, data are missing for the eastern United States. If the scientist 
is interested in studying this region, the browse image has already informed him that data are not 
present. The delete frame button would then be chosen, and the reference to the archived data
related to this browse image would be removed from the archive extraction list of the RDBMS.

Figure 4. Portions of the query and region selection windows for the UNIX IDIMS facility.
Note the selected region on the map and the corresponding latitudes and longitudes in the
query panel.


Another aspect of the browse image viewer window is that one may play continuous
movie loops of the image sequences, step forward or backward, or zoom in or out. The “files” button
allows one to delete image frames from the play selection, along with any references to the
archived data associated with those images. Once the images of interest have been selected,
the “retrieve files” option sends the modified query list back to the RDBMS, which subsequently
extracts the full resolution data from the archival system, as well as any ancillary data
requested.


A special data format was developed to handle the browse images. This format incorporated
a 140-byte header, followed by the compressed stream of image data. The header included
such information as day, time, latitude/longitude bounds, compression technique, and a short
verbal description of the data. The compression techniques are described in the next section. This
format differed from that used to store the archived full resolution data. In retrospect, the data for- 
mat should have conformed to a self descriptive data format such as the Hierarchical Data Format 
(HDF) developed by the National Center for Supercomputing Applications (NCSA, 1990.) The 
HDF format would have been more useful, due to the current availability of analysis tools 
designed to accept this format. It also is available for a wide variety of hardware platforms from 
UNIX workstations to personal computers. The one drawback of using HDF is that we would not
have had flexibility in the type of data compression performed on the data set.
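
To make the idea of a fixed 140-byte header concrete, the sketch below packs the kinds of fields listed above into exactly 140 bytes. The field order, field widths, byte order, and compression codes are all hypothetical; the report does not give the actual layout.

```python
import struct

# Hypothetical layout (an assumption, not the project's format):
# year, day of year, seconds of day, four 32-bit float bounds,
# a one-byte compression code, and a 115-byte description = 140 bytes.
HEADER = struct.Struct(">HHI4fB115s")


def pack_header(year, day_of_year, seconds, bounds, compression, description):
    """Build a 140-byte header; the description is truncated or null-padded."""
    lat_min, lat_max, lon_min, lon_max = bounds
    text = description.encode("ascii")[:115]
    return HEADER.pack(year, day_of_year, seconds,
                       lat_min, lat_max, lon_min, lon_max, compression, text)


def unpack_header(raw):
    """Split the leading 140 bytes of a browse file into named fields."""
    (year, day_of_year, seconds, lat_min, lat_max, lon_min, lon_max,
     compression, text) = HEADER.unpack(raw[:HEADER.size])
    return {
        "year": year,
        "day_of_year": day_of_year,
        "seconds": seconds,
        "bounds": (lat_min, lat_max, lon_min, lon_max),
        "compression": compression,
        "description": text.rstrip(b"\x00").decode("ascii"),
    }


header = pack_header(1991, 200, 43200, (25.0, 50.0, -125.0, -65.0), 2,
                     "sample browse image")
browse_file = header + b"compressed image bytes would follow"
fields = unpack_header(browse_file)
print(len(header), fields["description"])
```

A fixed-size header lets a reader locate the compressed stream at a known offset, at the cost of the self-description that a format such as HDF provides.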
Figure 5. Example of the image browser pop-up window.


DATA COMPRESSION 


In order to reduce the transmission time of the image data across the Internet, as well as 
reduce the total data storage requirements, we explored several data compression techniques to 
determine their usefulness. These techniques included run length encoding (RLE), subsampling, a 
modified Lempel-Ziv technique (Welch, 1984) (UNIX compress utility), and an artificial neural 
network approach using a counterpropagation network (Hecht-Nielsen, 1987, 1988.) We desired a 
lossless type of data compression technique or, if not possible, we wanted control over the amount 
of information lost due to data compression. For instance, subsampling loses data, but the remain- 
ing data is left intact as far as its actual values are concerned. The neural network approach is a 
lossy type of data compression, and after testing was discounted due to the extremely long training
times involved. We determined that RLE, while a highly used technique, was not practical for
the variety and type of data sets which we would use. In some cases, RLE might actually increase
the size of the data stream stored and transmitted due to the nature of the technique. Subsampling
was an attractive alternative in that we were able to transmit to the scientist essentially all the 
information necessary for data browse. For this study, a hybrid approach was utilized. Input 
images were first subsampled, removing every other pixel; this was followed by a Lempel-Ziv 
compression. This yielded images which were 320 by 160 pixels (51,200 bytes.) The original 
image size was 204,800 bytes. We also performed further compression by reducing the pixels to 
four bits per picture element, thus storing two pixels per 8 bits. It should be noted that the
run-length encoding technique used 12 bytes for header information, with 140 bytes used for the
Lempel-Ziv technique. Table 1 shows results from the data compression study which incorporated the
subsampling and storage of two pixels per 8 bits. These results show that the Lempel-Ziv
algorithm achieved data compression rates roughly twice those of the run-length encoding. Results from
the artificial neural network study, not shown, indicate compression ratios of 300:1 for similar 
images. However, as stated earlier, artificial neural networks do a lossy type of data compression. 
Another aspect of the neural network approach was the amount of time spent in training just one 
image. These times ranged from 10 to 30 minutes. In order to faithfully represent the input data 
patterns one would require perhaps hundreds of images in the training set and many hours of com- 
putational resources. Image reconstruction, however, is fast. Another attribute of the artificial neu- 
ral network technique used in this study is that it is an averaging technique and, if not careful, one 
will obtain a statistical average of the entire data set for any chosen browse image. 
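
The hybrid approach described in this section can be sketched in a few lines. Several assumptions are made: the original image is taken to be 640 by 320 one-byte pixels (consistent with the 204,800-byte figure), every other pixel is dropped along both axes, the four most significant bits of each pixel are kept, and zlib's DEFLATE is used as a readily available stand-in for the modified Lempel-Ziv method of the UNIX compress utility. The run-length encoder is a simple (count, value) scheme chosen for illustration, not the one used in the study.

```python
import zlib


def subsample_every_other(image):
    """Drop every other line and every other pixel (640x320 -> 320x160)."""
    return [row[::2] for row in image[::2]]


def pack_4bit(pixels):
    """Keep the top 4 bits of each 8-bit pixel; store two pixels per byte."""
    nibbles = [p >> 4 for p in pixels]
    if len(nibbles) % 2:
        nibbles.append(0)  # pad so the last byte is complete
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def rle_encode(data):
    """Run-length encode bytes as (count, value) pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


# Synthetic 640 by 320 image made of vertical bands (an assumption).
image = [[(x // 40) * 16 for x in range(640)] for _ in range(320)]
small = subsample_every_other(image)
flat = [p for row in small for p in row]
packed = pack_4bit(flat)
compressed = zlib.compress(packed)
print(len(flat), len(packed), len(compressed))

# Data with no repeated neighbors doubles in size under this RLE scheme.
print(len(rle_encode(bytes(range(10)))))
```

The last line shows why run-length encoding can enlarge a data stream: every run of length one still costs two bytes.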


Table 1: Comparison of compression techniques 


File #   Run Length   Run Length          Lempel-Ziv   Lempel-Ziv
         File Size    Compression Ratio   File Size    Compression Ratio
         (bytes)      (320x160)           (bytes)      (320x160)

  1        17235          3:1                8236          6:1
  2        19783          3:1               11117          5:1
  3        17796          3:1                9370          5:1
  4        19430          3:1               11944          4:1
  5        17364          3:1                9133          6:1
  6        16624          3:1                9006          6:1
  7        19043          3:1               11829          4:1
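
The ratios in Table 1 appear to be taken relative to the 51,200-byte subsampled image (320 by 160 pixels at one byte each). The short calculation below, which rests on that assumption, reproduces the rounded ratios from the tabulated file sizes and also shows how much smaller each Lempel-Ziv file is than its run-length counterpart.

```python
REFERENCE_BYTES = 320 * 160  # 51,200 bytes: one byte per subsampled pixel

# (file number, run-length file size, Lempel-Ziv file size) from Table 1.
TABLE_1 = [
    (1, 17235, 8236),
    (2, 19783, 11117),
    (3, 17796, 9370),
    (4, 19430, 11944),
    (5, 17364, 9133),
    (6, 16624, 9006),
    (7, 19043, 11829),
]


def ratio(size_bytes):
    """Compression ratio relative to the subsampled 320x160 image."""
    return REFERENCE_BYTES / size_bytes


for number, rle_size, lz_size in TABLE_1:
    print(f"file {number}: RLE {ratio(rle_size):.2f}:1, "
          f"Lempel-Ziv {ratio(lz_size):.2f}:1, "
          f"RLE size / LZ size = {rle_size / lz_size:.2f}")
```

For these seven files the run-length output is between about 1.6 and 2.1 times the size of the Lempel-Ziv output.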


CONCLUSION 


This study shows that a software-based approach using standards is the
most cost effective and beneficial. Proprietary hardware solutions are not only costly, but
prohibitive in their use and availability. The UNIX operating system, X-Windows user interface, and
TCP/IP network protocols provided a mature and popular set of standards on which to base
development of an IDIMS GUI prototype. UNIX IDIMS was designed to allow the use of more
metadata for cataloging scientific data sets, to be more flexible and easier to navigate than the original
IDIMS, and to allow the user to browse reduced resolution versions of image data before deciding
which files to retrieve.

The data compression study indicated that the modified Lempel-Ziv algorithm
outperformed run-length encoding techniques. While the counterpropagation artificial neural network is
a lossy type of data compression, it still warrants further study. The efforts of this study have
allowed MSFC researchers to obtain funding to assist in the generation of the Information
Management System browse facility for the Version 0 Earth Observing System Data and Information
System, Distributed Active Archive Center Facility.